transport: prevent deadlock in transport Close when GoAway write hangs #7662

Merged
merged 3 commits into from
Oct 8, 2024

Conversation

aranjans
Contributor

@aranjans aranjans commented Sep 23, 2024

Fixes #7606.

A couple of recent changes are worth noting here:

  • Send GOAWAY to server on Client Transport Shutdown #7015 added logic to send a GOAWAY frame on client transport shutdown. Unfortunately, the handler responsible for writing the GOAWAY frame holds the client transport mutex while it attempts the write.
  • transport: add timeout for writing GOAWAY on http2Client.Close() #7371 introduced a timeout in the client transport shutdown handler to wait for loopyWriter to exit (after enqueueing the above GOAWAY frame on the controlbuf). This ensures that client transport shutdown can complete even when a hanging network connection blocks forever on writing the GOAWAY frame.

Description of the deadlock:

  • During client transport shutdown, after enqueueing the GOAWAY frame on the controlbuf, http2Client.Close calls http2Client.GetGoAwayReason to fetch the last GOAWAY's debug message, and the latter attempts to grab http2Client.mu.
  • http2Client.outgoingGoAwayHandler holds http2Client.mu while it attempts to write the GOAWAY frame. If the underlying network connection hangs, the handler never releases the mutex, http2Client.GetGoAwayReason can never acquire it, and http2Client.Close deadlocks.
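
The interaction above can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the code changed in this PR: `toyTransport`, `writeGoAway`, `reason`, `closeTransport`, and `closeReturns` are hypothetical names, and the hanging network write is simulated with a channel receive. It contrasts holding the mutex across the blocking write with releasing it before the write, which is one way to avoid the deadlock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyTransport is a hypothetical stand-in for a client transport.
type toyTransport struct {
	mu           sync.Mutex
	goAwayReason string
	unblock      chan struct{} // closed to release the simulated hanging write
	locked       chan struct{} // closed once the handler has acquired mu
	writerDone   chan struct{} // closed when the writer goroutine exits
}

// hangingWrite simulates a network write that blocks until unblock is closed.
func (t *toyTransport) hangingWrite() { <-t.unblock }

// writeGoAway models the GOAWAY handler. With holdLock set, it keeps mu
// across the blocking write; otherwise it releases mu before writing.
func (t *toyTransport) writeGoAway(holdLock bool) {
	defer close(t.writerDone)
	t.mu.Lock()
	close(t.locked)
	t.goAwayReason = "client shutdown"
	if holdLock {
		defer t.mu.Unlock()
		t.hangingWrite()
		return
	}
	t.mu.Unlock()
	t.hangingWrite()
}

// reason needs mu, like a getter for the GOAWAY reason.
func (t *toyTransport) reason() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.goAwayReason
}

// closeTransport waits a bounded time for the writer, then reads the reason.
func (t *toyTransport) closeTransport(timeout time.Duration) string {
	select {
	case <-t.writerDone:
	case <-time.After(timeout):
	}
	return t.reason()
}

// closeReturns reports whether closeTransport finishes within 500ms while
// the simulated write is still hanging.
func closeReturns(holdLock bool) bool {
	t := &toyTransport{
		unblock:    make(chan struct{}),
		locked:     make(chan struct{}),
		writerDone: make(chan struct{}),
	}
	defer close(t.unblock) // release the simulated write so goroutines can exit
	go t.writeGoAway(holdLock)
	<-t.locked // the handler has acquired mu

	closed := make(chan struct{})
	go func() {
		t.closeTransport(20 * time.Millisecond)
		close(closed)
	}()
	select {
	case <-closed:
		return true
	case <-time.After(500 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("mu held during write, close returns:", closeReturns(true))
	fmt.Println("mu released before write, close returns:", closeReturns(false))
}
```

In the first case the bounded wait expires, but `reason` then blocks on `mu` for as long as the write hangs. In the second case `mu` is free by the time the wait expires, so shutdown completes.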

RELEASE NOTES:

  • transport: prevent deadlock in client transport shutdown when writing the GOAWAY frame hangs.

@aranjans aranjans added the Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. label Sep 23, 2024
@aranjans aranjans added this to the 1.68 Release milestone Sep 23, 2024

codecov bot commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.96%. Comparing base (1418e5e) to head (4eaf6ab).
Report is 22 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7662      +/-   ##
==========================================
+ Coverage   81.89%   81.96%   +0.06%     
==========================================
  Files         361      361              
  Lines       27818    27823       +5     
==========================================
+ Hits        22782    22805      +23     
+ Misses       3847     3833      -14     
+ Partials     1189     1185       -4     
Files with missing lines Coverage Δ
internal/transport/http2_client.go 92.12% <100.00%> (+0.50%) ⬆️

... and 28 files with indirect coverage changes

@aranjans aranjans changed the title from "transport/http2_client: fixed flaky test Test/ClientCloseReturnsEarlyWhenGoAwayWriteHangs" to "transport/http2_client: fixed Test/ClientCloseReturnsEarlyWhenGoAwayWriteHangs" Sep 23, 2024
@aranjans aranjans requested review from arvindbr8 and removed request for arvindbr8 September 24, 2024 16:59
@purnesh42H
Contributor

Is there an existing issue filed which needs to be linked?

@aranjans
Contributor Author

@purnesh42H Yes, I added it to the description.

@purnesh42H purnesh42H self-assigned this Sep 30, 2024
@aranjans aranjans changed the title from "transport/http2_client: fixed Test/ClientCloseReturnsEarlyWhenGoAwayWriteHangs" to "transport: fixed deadlock happening due to http2Client.mu" Oct 1, 2024
@easwars
Contributor

easwars commented Oct 1, 2024

Some general comments here:

  • Can you please use bullet points in the PR description to make it more readable? Also, use backticks wherever appropriate to highlight code symbols.
  • The release note (and the PR title) needs to be more descriptive. Saying that a deadlock is happening due to http2Client.mu does not give the user any useful information.

internal/transport/http2_client.go (outdated, resolved)
internal/transport/http2_client.go (outdated, resolved)
@easwars easwars assigned aranjans and unassigned purnesh42H Oct 1, 2024
@aranjans aranjans changed the title from "transport: fixed deadlock happening due to http2Client.mu" to "transport: prevent deadlock in transport Close when GoAway write hangs" Oct 3, 2024
@aranjans aranjans assigned easwars and unassigned aranjans Oct 3, 2024
@easwars easwars assigned aranjans and unassigned easwars Oct 7, 2024
Contributor

@purnesh42H purnesh42H left a comment

lgtm

@purnesh42H purnesh42H merged commit d365be6 into grpc:master Oct 8, 2024
14 checks passed
Labels
Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: Test/ClientCloseReturnsEarlyWhenGoAwayWriteHangs
3 participants